Question Answering on Knowledge Bases and Text using Universal Schema and Memory Networks
Existing question answering methods infer answers either from a knowledge
base or from raw text. While knowledge base (KB) methods are good at answering
compositional questions, their performance is often affected by the
incompleteness of the KB. In contrast, web text contains millions of facts
that are absent from the KB, albeit in an unstructured form. {\it Universal
schema} can support reasoning on the union of both structured KBs and
unstructured text by aligning them in a common embedded space. In this paper we
extend universal schema to natural language question answering, employing
\emph{memory networks} to attend to the large body of facts in the combination
of text and KB. Our models can be trained in an end-to-end fashion on
question-answer pairs. Evaluation results on the SPADES fill-in-the-blank question
answering dataset show that exploiting universal schema for question answering
is better than using either a KB or text alone. This model also outperforms the
current state-of-the-art by 8.5 points.\footnote{Code and data available
at \url{https://rajarshd.github.io/TextKBQA}} Comment: ACL 2017 (short paper)
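The core idea above — embedding KB triples and textual patterns into one space and letting a memory network attend over them — can be sketched as single-hop attention. This is a minimal illustration with made-up dimensions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8        # shared embedding dimension (illustrative)
n_facts = 5  # memory slots: KB triples and text patterns, embedded jointly

memory_keys = rng.normal(size=(n_facts, d))    # universal-schema fact embeddings
memory_values = rng.normal(size=(n_facts, d))  # answer-side embeddings
question = rng.normal(size=d)                  # encoded question

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Attend over all facts at once, whether they came from the KB or raw text.
attention = softmax(memory_keys @ question)
answer_repr = attention @ memory_values  # weighted sum used to score answers
```

Because KB facts and text facts live in the same embedded space, a single attention step reasons over their union; end-to-end training then only needs question-answer pairs.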
Unsupervised Abstractive Dialogue Summarization for Tete-a-Tetes
High-quality dialogue-summary paired data is expensive to produce and
domain-sensitive, making abstractive dialogue summarization a challenging task.
In this work, we propose the first unsupervised abstractive dialogue
summarization model for tete-a-tetes (SuTaT). Unlike standard text
summarization, a dialogue summarization method should consider the
multi-speaker scenario where the speakers have different roles, goals, and
language styles. In a tete-a-tete, such as a customer-agent conversation, SuTaT
aims to summarize for each speaker by modeling the customer utterances and the
agent utterances separately while retaining their correlations. SuTaT consists
of a conditional generative module and two unsupervised summarization modules.
The conditional generative module contains two encoders and two decoders in a
variational autoencoder framework where the dependencies between two latent
spaces are captured. With the same encoders and decoders, two unsupervised
summarization modules equipped with sentence-level self-attention mechanisms
generate summaries without using any annotations. Experimental results show
that SuTaT is superior on unsupervised dialogue summarization in both
automatic and human evaluations, and is capable of dialogue classification and
single-turn conversation generation.
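The conditional generative module described above pairs two variational autoencoders and captures dependencies between their latent spaces. A toy sketch of that structure, with linear "encoders" and a linear cross-speaker dependency map (all shapes and maps are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d_z = 4  # latent dimension (illustrative)

def encode(utterance_vec, W):
    # Toy encoder: mean from a linear map, unit variance.
    mu = W @ utterance_vec
    logvar = np.zeros(d_z)
    return mu, logvar

def reparameterize(mu, logvar, rng):
    # Standard VAE reparameterization trick.
    return mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)

customer_vec = rng.normal(size=d_z)  # encoded customer utterance
agent_vec = rng.normal(size=d_z)     # encoded agent utterance
W_c = rng.normal(size=(d_z, d_z))    # customer encoder weights
W_a = rng.normal(size=(d_z, d_z))    # agent encoder weights
A = rng.normal(size=(d_z, d_z))      # dependency between the two latent spaces

mu_c, lv_c = encode(customer_vec, W_c)
z_c = reparameterize(mu_c, lv_c, rng)

# The agent latent is conditioned on the customer latent, so the two
# speakers are modeled separately while their correlation is retained.
mu_a, lv_a = encode(agent_vec, W_a)
z_a = reparameterize(mu_a + A @ z_c, lv_a, rng)
```

The same encoders and decoders are then reused by the unsupervised summarization modules, which is why no summary annotations are needed.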
Improving Dual-Encoder Training through Dynamic Indexes for Negative Mining
Dual encoder models are ubiquitous in modern classification and retrieval.
Crucial for training such dual encoders is an accurate estimation of gradients
from the partition function of the softmax over the large output space; this
requires finding negative targets that contribute most significantly ("hard
negatives"). Since dual encoder model parameters change during training, the
use of traditional static nearest neighbor indexes can be sub-optimal. These
static indexes (1) periodically require expensive re-building of the index,
which in turn requires (2) expensive re-encoding of all targets using updated
model parameters. This paper addresses both of these challenges. First, we
introduce an algorithm that uses a tree structure to approximate the softmax
with provable bounds and that dynamically maintains the tree. Second, we
approximate the effect of a gradient update on target encodings with an
efficient Nystrom low-rank approximation. In our empirical study on datasets
with over twenty million targets, our approach cuts error in half relative
to oracle brute-force negative mining. Furthermore, our method surpasses prior
state-of-the-art while using 150x less accelerator memory. Comment: To appear at AISTATS 2023
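Hard-negative mining, the subroutine this paper accelerates, can be sketched with a brute-force scan: score every target against the query embedding and keep the highest-scoring non-gold targets. This brute-force scan is the oracle the paper compares against; the paper's contribution replaces it with a dynamically maintained tree index and a Nyström correction for stale encodings, neither of which is shown here:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_targets, k = 16, 1000, 5  # illustrative sizes

target_enc = rng.normal(size=(n_targets, d))  # possibly-stale target encodings
query_enc = rng.normal(size=d)                # fresh query-side encoding
gold = 42                                     # index of the positive target

scores = target_enc @ query_enc  # softmax logits over the output space
scores[gold] = -np.inf           # exclude the positive from mining

# Top-k highest-scoring targets = the "hard negatives" that dominate the
# gradient of the softmax partition function.
hard_negatives = np.argpartition(-scores, k)[:k]
```

Because the dual encoder's parameters move during training, `target_enc` drifts out of date between re-encodings; that staleness is exactly what makes static nearest-neighbor indexes sub-optimal.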
Compressed Video Action Recognition
Training robust deep video representations has proven to be much more
challenging than learning deep image representations. This is in part due to
the enormous size of raw video streams and the high temporal redundancy; the
true and interesting signal is often drowned in too much irrelevant data.
Motivated by the fact that video compression (using H.264, HEVC, etc.) can
reduce this superfluous information by up to two orders of magnitude, we propose
to train a deep network directly on the compressed video.
This representation has a higher information density, and we found the
training to be easier. In addition, the signals in a compressed video provide
free, albeit noisy, motion information. We propose novel techniques to use them
effectively. Our approach is about 4.6 times faster than Res3D and 2.7 times
faster than ResNet-152. On the task of action recognition, our approach
outperforms all the other methods on the UCF-101, HMDB-51, and Charades
datasets. Comment: CVPR 2018 (selected for spotlight presentation)
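The signals a compressed stream exposes — an I-frame, per-block motion vectors, and a small residual — can be illustrated with a toy motion-compensation step. A single global motion vector and a tiny frame stand in for a codec's per-block vectors; this is a simplification, not the codecs' actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(3)
i_frame = rng.random((8, 8))  # toy intra-coded reference frame

dx, dy = 1, 2  # motion vector (real codecs store one per block)
predicted = np.roll(i_frame, shift=(dy, dx), axis=(0, 1))

# Residual: the small correction left after motion compensation.
residual = rng.normal(scale=0.01, size=i_frame.shape)

# What a decoder would reconstruct for the P-frame.
p_frame = predicted + residual
```

A network consuming `(i_frame, motion vectors, residual)` directly sees the free, albeit noisy, motion information without first decoding back to highly redundant raw frames.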